{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Hackathon 6\n",
    "\n",
    "Written by Eleanor Quint\n",
    "\n",
    "Topics:\n",
    "- Techniques for dimension expansion\n",
    "    - Transpose convolutions\n",
    "    - Sub-pixel convolutions\n",
    "    - ProgressiveGAN upscaling\n",
    "- Autoencoding\n",
    "    - Sparse autoencoders\n",
    "    - De-noising autoencoders\n",
    "\n",
    "This is all setup in a IPython notebook so you can run any code you want to experiment with. Feel free to edit any cell, or add some to run your own code."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# We'll start with our library imports...\n",
    "from __future__ import print_function\n",
    "\n",
    "import os  # to work with file paths\n",
    "\n",
    "import tensorflow as tf         # to specify and run computation graphs\n",
    "import numpy as np              # for numerical operations taking place outside of the TF graph\n",
    "import matplotlib.pyplot as plt # to draw plots\n",
    "\n",
    "mnist_dir = '/work/cse496dl/shared/hackathon/03/mnist/'\n",
    "cifar_dir = '/work/cse496dl/shared/hackathon/05/'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# extract our dataset, MNIST\n",
    "train_data = np.load(mnist_dir + 'mnist_train_images.npy')\n",
    "train_data = np.reshape(train_data, [-1, 28, 28, 1])\n",
    "test_data = np.load(mnist_dir + 'mnist_test_images.npy')\n",
    "test_data = np.reshape(test_data, [-1, 28, 28, 1])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Techniques for dimension expansion\n",
    "\n",
    "Generally we compress high dimensional representations into lower dimensional ones. Now, we're going to study ways of going from lower dimensional to higher. For this, we're going to define a function `upscale_block` in three different ways.\n",
    "\n",
    "#### Transpose convolutions\n",
    "\n",
    "Although we can reverse linear transformations very easily, images are less straightforward to upscale. Aside from naive, classical techniques for upscaling, we can learn to increase the size of images with transpose convolutions. They are sometimes called \"deconvolutions\" because they're the inverse operation of the convolution, but it is actually the transpose (gradient) of conv2d rather than an actual deconvolution. Transpose convolutions are implemented by [tf.layers.conv2d_transpose](https://www.tensorflow.org/api_docs/python/tf/layers/conv2d_transpose)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def upscale_block(x, scale=2):\n",
    "    \"\"\" conv2d_transpose \"\"\"\n",
    "    return tf.layers.conv2d_transpose(x, 3, 3, strides=(scale, scale), padding='same', activation=tf.nn.relu)\n",
    "\n",
    "tf.reset_default_graph()\n",
    "x = tf.placeholder(tf.float32, shape=[None, 32, 32, 3])\n",
    "up_x = upscale_block(x)\n",
    "print(up_x)\n",
    "num_params = np.sum([np.prod(v.get_shape().as_list()) for v in tf.trainable_variables()])\n",
    "print('Parameters: ' + str(num_params))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Sub-pixel convolutions\n",
    "\n",
    "Another approach is called the sub-pixel convolution, which does a convolution with many channels, and then re-orders the data into the height and width dimensions from the channels dimension:\n",
    "\n",
    "<img src=\"https://ai2-s2-public.s3.amazonaws.com/figures/2017-08-08/03a5b2aac53443e6078f0f63b35d4f95d6d54c5d/2-Figure1-1.png\">\n",
    "\n",
    "(Image sourced from [Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network](https://arxiv.org/abs/1609.05158))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def upscale_block(x, scale=2):\n",
    "    \"\"\" [Sub-Pixel Convolution](https://arxiv.org/abs/1609.05158) \"\"\"\n",
    "    n, w, h, c = x.get_shape().as_list()\n",
    "    x = tf.layers.conv2d(x, c * scale ** 2, (3, 3), activation=tf.nn.relu, padding='same')\n",
    "    output = tf.depth_to_space(x, scale)\n",
    "    return output\n",
    "    \n",
    "tf.reset_default_graph()\n",
    "x = tf.placeholder(tf.float32, shape=[None, 32, 32, 3])\n",
    "x_up = upscale_block(x)\n",
    "print(x_up)\n",
    "num_params = np.sum([np.prod(v.get_shape().as_list()) for v in tf.trainable_variables()])\n",
    "print('Parameters: ' + str(num_params))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### ProgressiveGAN upscaling\n",
    "\n",
    "Finally, one technique that's recently found massive success in Nvidia's ProgressiveGAN used to generate high-resolution fake celebrity faces:\n",
    "\n",
    "<img src=\"https://i2.wp.com/robotnyheter.se/wp-content/uploads/2018/01/Nvidia_GAN_ansikten.jpg?w=1561\" width=\"70%\">\n",
    "\n",
    "None of these are real photos, they've all been upsampled from Gaussian noise with this technique"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def upscale_block(x, scale=2):\n",
    "    \"\"\" similar to the upsampling used in [ProgressiveGAN](https://arxiv.org/pdf/1710.10196.pdf) \"\"\"\n",
    "    n, w, h, c = x.get_shape().as_list()\n",
    "    up_x = tf.image.resize_nearest_neighbor(x, [scale*h, scale*w])\n",
    "    output = tf.layers.conv2d(up_x, 3, 3, padding='same', activation=tf.nn.relu)\n",
    "    return output\n",
    "\n",
    "tf.reset_default_graph()\n",
    "x = tf.placeholder(tf.float32, shape=[None, 32, 32, 3])\n",
    "x_up = upscale_block(x)\n",
    "print(x_up)\n",
    "num_params = np.sum([np.prod(v.get_shape().as_list()) for v in tf.trainable_variables()])\n",
    "print('Parameters: ' + str(num_params))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Autoencoding\n",
    "\n",
    "Generally, autoencoding is learning \"a complicated identity function\". This makes it a form of unsupervised learning, which doesn't require data to be explicitly labeled, but instead looks for patterns and trends in data. Typically the complication is to bottleneck the size of the representation, but can also be more varied. We'll look at code for sparse autoencoders and de-noising autoencoders.\n",
    "\n",
    "First we'll define some preliminaries that we'll use in both architectures:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def upscale_block(x, scale=2):\n",
    "    \"\"\"transpose convolution upscale\"\"\"\n",
    "    return tf.layers.conv2d_transpose(x, 1, 3, strides=(scale, scale), padding='same', activation=tf.nn.relu)\n",
    "\n",
    "def downscale_block(x, scale=2):\n",
    "    n, h, w, c = x.get_shape().as_list()\n",
    "    return tf.layers.conv2d(x, np.floor(c * 1.25), 3, strides=scale, padding='same')\n",
    "\n",
    "def autoencoder_network(x, code_size=100):\n",
    "    \"\"\"This network assumes [?, 28, 28, ?] shaped input\"\"\"\n",
    "    encoder_14 = downscale_block(x)\n",
    "    encoder_7 = downscale_block(encoder_14)\n",
    "    flatten_dim = np.prod(encoder_7.get_shape().as_list()[1:])\n",
    "    flat = tf.reshape(encoder_7, [-1, flatten_dim])\n",
    "    code = tf.layers.dense(flat, code_size, activation=tf.nn.relu)\n",
    "    hidden_decoder = tf.layers.dense(code, 49, activation=tf.nn.elu)\n",
    "    decoder_7 = tf.reshape(hidden_decoder, [-1, 7, 7, 1])\n",
    "    decoder_14 = upscale_block(decoder_7)\n",
    "    output = upscale_block(decoder_14)\n",
    "    return code, output"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Sparse autoenoding\n",
    "\n",
    "Although we bottleneck the representation in normal autoencoding by reducing the dimensionality, sparse autoencoders can actually increase it, but restrict it to be sparsely activated with L1 regularization using [tf.norm](https://www.tensorflow.org/api_docs/python/tf/norm) or KL-divergence. This has the effect of only having non-zero values in a few dimensions, effectively bottlenecking each representation, but giving a greater variety of dimensions to choose to be used."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true,
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# set hyperparameters\n",
    "sparsity_weight = 5e-3\n",
    "code_size = 100\n",
    "\n",
    "# define graph\n",
    "tf.reset_default_graph()\n",
    "x = tf.placeholder(tf.float32, shape=[None, 28, 28, 1])\n",
    "code, outputs = autoencoder_network(x, code_size)\n",
    "\n",
    "# just for fun\n",
    "num_params = np.sum([np.prod(v.get_shape().as_list()) for v in tf.trainable_variables()])\n",
    "print('Parameters: ' + str(num_params))\n",
    "\n",
    "# calculate loss\n",
    "sparsity_loss = tf.norm(code, ord=1, axis=1)\n",
    "reconstruction_loss = tf.reduce_mean(tf.square(outputs - x)) # Mean Square Error\n",
    "total_loss = reconstruction_loss + sparsity_weight * sparsity_loss\n",
    "\n",
    "# setup optimizer\n",
    "optimizer = tf.train.AdamOptimizer()\n",
    "train_op = optimizer.minimize(total_loss)\n",
    "\n",
    "# train for an epoch\n",
    "batch_size = 16\n",
    "session = tf.Session()\n",
    "session.run(tf.global_variables_initializer())\n",
    "for epoch in range(1):\n",
    "    for i in range(train_data.shape[0] // batch_size):\n",
    "        batch_xs = train_data[i*batch_size:(i+1)*batch_size, :]\n",
    "        session.run(train_op, {x: batch_xs})\n",
    "print(\"Done!\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# run a test\n",
    "idx = np.random.randint(test_data.shape[0])\n",
    "x_out, code_out, output_out = session.run([x, code, outputs], {x: np.expand_dims(test_data[idx], axis=0)})\n",
    "print(\"Code and Input\")\n",
    "plt.imshow(np.squeeze(x_out))\n",
    "print(code_out)\n",
    "print(\"Number of nonzero code dimensions: {}/{}\".format(np.count_nonzero(code_out), code_size))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Show reconstruction\n",
    "# This is in a different cell from the last plt.imshow call to show 2 images at once\n",
    "print(\"Reconstruction\")\n",
    "plt.imshow(np.squeeze(output_out))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Denoising Autoencoder\n",
    "\n",
    "Another way to force the an autoencoder to learn the features of data is to force it to map noisy, corrupted versions of the data back to the original. This is usually accomplished by manually adding noice (e.g., Gaussian), but may also be useful in real world settings.\n",
    "\n",
    "The noise level could be scaled up as training proceeds to implement a form of [curriculum learning](https://ronan.collobert.com/pub/matos/2009_curriculum_icml.pdf)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# hyperparameters\n",
    "noise_level = 0.1\n",
    "code_size = 40\n",
    "\n",
    "# define graph\n",
    "tf.reset_default_graph()\n",
    "x = tf.placeholder(tf.float32, shape=[None, 28, 28, 1])\n",
    "x_noisy = x + noise_level * tf.random_normal(tf.shape(x))\n",
    "code, outputs = autoencoder_network(x_noisy, code_size=code_size)\n",
    "\n",
    "# calculate loss\n",
    "reconstruction_loss = tf.reduce_mean(tf.square(outputs - x)) # MSE\n",
    "total_loss = reconstruction_loss # just for consistency\n",
    "\n",
    "# setup optimizer\n",
    "optimizer = tf.train.AdamOptimizer()\n",
    "train_op = optimizer.minimize(total_loss)\n",
    "\n",
    "# train for an epoch and visualize\n",
    "batch_size = 16\n",
    "session = tf.Session()\n",
    "session.run(tf.global_variables_initializer())\n",
    "for epoch in range(2):\n",
    "    for i in range(train_data.shape[0] // batch_size):\n",
    "        batch_xs = train_data[i*batch_size:(i+1)*batch_size, :]\n",
    "        session.run(train_op, {x: batch_xs})\n",
    "print(\"Done!\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# run a test\n",
    "idx = np.random.randint(test_data.shape[0])\n",
    "x_out, noisy_x_out, code_out, output_out = session.run([x, x_noisy, code, outputs], {x: np.expand_dims(test_data[idx], axis=0)})\n",
    "print(\"Input\")\n",
    "plt.imshow(np.squeeze(x_out))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# show the noised input and code\n",
    "plt.imshow(np.squeeze(noisy_x_out))\n",
    "print(\"Code and Noised Input\")\n",
    "print(code_out)\n",
    "print(\"Number of nonzero code dimensions: {}/{}\".format(np.count_nonzero(code_out), code_size))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Show reconstruction\n",
    "# This has to be in a different cell from the last plt.imshow call to show 2 images at once\n",
    "print(\"Reconstruction\")\n",
    "plt.imshow(np.squeeze(output_out))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Hackathon 6 Exercise\n",
    "\n",
    "Write code specifying an autoencoder with a code of shape `[None, 4, 4, 1]` with the provided input placeholder (corresponding to a color image like CIFAR-10). Specify a reasonable autoencoding loss function so that the given optimizer code runs."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "tf.reset_default_graph()\n",
    "x = tf.placeholder(tf.float32, shape=[None, 32, 32, 3])\n",
    "# your code here\n",
    "\n",
    "# if your code works, this should have no problem\n",
    "optimizer = tf.train.AdamOptimizer()\n",
    "train_op = optimizer.minimize(loss)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Coda\n",
    "\n",
    "#### [Progressive GAN latent space interpolation on Youtube](https://youtu.be/XOxxPcy5Gr4?t=1m48s)\n",
    "\n",
    "#### [How to Use t-SNE Effectively (distill.pub)](https://distill.pub/2016/misread-tsne/)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "TensorFlow 1.12 (py36)",
   "language": "python",
   "name": "tensorflow-1.12-py36"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
